An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters

Author

  • Stephen A. Ramsey
Abstract

A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of representative binding sites for a transcription factor and two sets of noncoding, 5' regulatory sequences for gene sets that are to be compared. An empirical prior on the frequency λ (per base pair of gene-vicinal, noncoding DNA) of TFBSs is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The count of TFBS β (conditioned on the observed sequence) is sampled using Metropolis-Hastings with an information entropy-based move generator. The derivation of the method is presented in a step-by-step fashion, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on β improves accuracy for estimating the number of TFBS within a set of promoter sequences.
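The sampling step mentioned in the abstract can be illustrated with a minimal Metropolis-Hastings sketch over an integer site count. This is not the paper's method: it substitutes a plain symmetric ±1 random-walk proposal for the information entropy-based move generator, ignores finite-width effects, and uses a toy Poisson prior and likelihood. The function names and the parameters `matches`, `seq_len`, `lam`, `background`, and `n_max` are hypothetical and chosen only for illustration.

```python
import math
import random


def log_poisson(k, mean):
    """Log of the Poisson probability mass at k for a positive mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_posterior(n, matches, seq_len, lam, background):
    """Unnormalized log posterior of the site count n under a toy model.

    Assumed prior: n ~ Poisson(lam * seq_len).
    Assumed likelihood: observed motif matches ~ Poisson(n + background),
    where background (> 0) is the expected number of spurious matches.
    """
    log_prior = log_poisson(n, lam * seq_len)
    log_like = log_poisson(matches, n + background)
    return log_prior + log_like


def sample_counts(matches, seq_len, lam, background, n_max, steps, seed=0):
    """Metropolis-Hastings with a symmetric +/-1 proposal on {0, ..., n_max}."""
    rng = random.Random(seed)
    n = min(matches, n_max)  # start the chain at the observed match count
    current = log_posterior(n, matches, seq_len, lam, background)
    samples = []
    for _ in range(steps):
        proposal = n + rng.choice((-1, 1))
        # Proposals outside the allowed range are rejected (chain stays put).
        if 0 <= proposal <= n_max:
            candidate = log_posterior(proposal, matches, seq_len, lam, background)
            accept_prob = math.exp(min(0.0, candidate - current))
            if rng.random() < accept_prob:
                n, current = proposal, candidate
        samples.append(n)
    return samples


if __name__ == "__main__":
    draws = sample_counts(matches=12, seq_len=5000, lam=0.002,
                          background=3.0, n_max=50, steps=20000)
    kept = draws[2000:]  # discard an arbitrary burn-in period
    print("posterior mean count:", sum(kept) / len(kept))
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of posterior values, which is why only the unnormalized log posterior is needed. Changing `lam` shows how strongly the prior on the per-base-pair frequency pulls the sampled counts.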

Similar resources

The Allele and Genotype Frequencies of Bovine Pituitary-specific Transcription Factor and Leptin Genes in Iranian Cattle and Buffalo Populations Using PCR-RFLP

The use of polymorphic markers in breeding programmes could make selection more accurate and efficient. A total of 324 individuals from six Iranian cattle populations (Sarabi, Golpayegani, Sistani, Taleshi, Mazandarani, Dashtiyari), F1 Golpayegani × Brown Swiss and Iranian buffalo populations were genotyped for the Pit-1 HinfI and leptin Sau3AI polymorphisms by the polymerase chain reactio...

RBF-TSS: Identification of Transcription Start Site in Human Using Radial Basis Functions Network and Oligonucleotide Positional Frequencies

Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of the structure of genes and gene regulation within a given genome. Many recently published methods have achieved high identification accuracy of TSS. However, models providing more accurate modeling of promoters and TSS are needed. A novel identification meth...

Genomic Promoter Analysis Predicts Functional Transcription Factor Binding

Background. The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology. Results. We have analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, and of thos...

Estimation of the Parameters of the Lomax Distribution using the EM Algorithm and Lindley Approximation

Estimation of the parameters of a statistical distribution is one of the important subjects of statistical inference. Due to the applications of the Lomax distribution in business, economy, statistical science, queue theory, internet traffic modeling and so on, in this paper, the parameters of the Lomax distribution under type II censored samples are estimated using maximum likelihood and Bayesian methods. Wherea...

Mapping of Transcription Factor Binding Region of Kappa Casein (CSN3) Gene in Iranian Bacterianus and Dromedaries Camels

κ-casein is a glycosylated protein in mammalian milk that plays an essential role in the milk micelles. Control of κ-casein expression reflects this essential role, although an understanding of the mechanisms involved lags behind that of the other milk protein genes. Transcriptional regulation, a first mechanism for controlling the development of organisms, is carried out by transcription facto...

Journal title:

Volume 9, Issue

Pages -

Publication date: 2015